home *** CD-ROM | disk | FTP | other *** search
-
-
-
- Sorting and Searching
- ================================
-
- In the world of computer science perhaps no other tasks are
- more fundamental or as extensively analyzed as those of sorting
- and of searching. These routines are used in virtually all
- database programs, as well as in compilers, interpreters, and
- operating systems. This lesson introduces the basics of sorting
- and searching information. Because sorting data generally makes
- searching the data easier and faster, sorting is discussed first.
-
- Sorting is the process of arranging a set of similar informa-
- tion into an increasing or decreasing order. Specifically, given
- a sorted list a of n elements,
-
- a[1] <= a[2] <= ... <= a[n]
-
- Though ANSI C supplies the standard qsort() function as part
- of the standard library, the study and understanding of sorting
- is important for three main reasons. First, a generalized
- function like qsort() cannot be applied to all situations.
- Second, because qsort() is parameterized to operate on a wide
- variety of data, it will run slower than a similar sort that
- operates on only one type of data. (The generalization process
- inherently increases run time because of the extra processing
- time needed to handle various data types.) Finally, the Quicksort
- algorithm (used by qsort()), although very good for the general
- case, may not be the best type of sort for specialized situations.
-
- Generally, when information (such as a mailing list) is
- sorted, only a portion of that information is used as the sort
- key. This key is used in comparisons, but when an exchange is
- made the entire data structure is swapped. In a mailing list, for
- example, the ZIP code field might be used as the key, but the
- entire address is sorted. For the sake of simplicity while
- developing the various sorting methods, you will be sorting
- character arrays. Later, you will learn how to adapt any of these
- methods to any type of data structure.
-
- The three general methods that can be used to sort arrays
- are:
-
- By exchange
-
- By selection
-
- By insertion
-
- To understand these three methods, imagine a deck of cards.
- To sort the cards using exchange, you would spread the cards,
- face up, on a table and then proceed to exchange out-of-order
- cards until the deck is ordered.
-
- To sort by selection, you would spread the cards on the
- table, select the lowest-value card, take it out of the deck, and
- hold it in your hand. Then you would select from the remaining
- cards on the table the lowest card and place it behind the one
- already in your hand. This process would continue until all the
- cards were in your hand. Because you always select the lowest
- card from those remaining on the table, the cards in your hand
- would be sorted when the process was complete.
-
- To sort the cards using insertion, you would hold the cards
- in your hand and would take one at a time. As you took cards from
- the deck, you would place them into a new deck on the table,
- always inserting them in the correct position. The deck would be
- sorted when you had no cards in your hand.
-
- Many different algorithms exist for each of the three sorting
- methods. Each algorithm has its merits, but the general criteria
- for judging a sorting algorithm are based on the following
- questions:
-
- How fast can it sort information in an average case?
-
- How fast is its best and worst case?
-
- Does it exhibit natural or unnatural behavior?
-
- Does it rearrange elements with equal keys?
-
- How fast a particular algorithm sorts is of great concern.
- The speed with which an array can be sorted is directly related
- to the number of comparisons and the number of exchanges
- (exchanges take more time). A comparison occurs when one array
- element is compared to another. An exchange happens when two
- elements are swapped in the array. Later in this chapter you will
- see that some sorts require an exponential amount of time per
- element to sort and some require logarithmic time.
-
- The best- and worst-case run times are important if you
- expect to encounter one of these situations frequently. Often a
- sort will have a good average case, but a terrible worst case.
-
- A sort is said to exhibit natural behavior if it works least
- when the list is already in order, harder as the list becomes
- less ordered, and hardest when a list is in inverse order. How
- hard a sort works is based on the number of comparisons and moves
- that must be executed.
-
- To understand the importance of rearranging elements with
- equal keys, imagine a database that is sorted on a main key and a
- subkey - for example, a mailing list with the ZIP code as the main
- key and the last name within the same ZIP code as the subkey.
- When a new address is added to the list, and the list sorted
- again, you do not want the subkeys (that is, last names within
- ZIP codes) to be rearranged. To guarantee this, a sort must not
- exchange main keys of equal value.
-
- In the following sections, representative sorts from each
- class of sorting algorithms are analyzed to judge their
- efficiency. Later, improved sorting methods are studied.
-
-
- The Bubble Sort
- ~~~~~~~~~~~~~~~
- The best-known (and most infamous) sort is the Bubble sort.
- Its popularity is derived from its catchy name and its simplicity.
- For reasons that will become evident, this is one of the worst
- sorts ever conceived.
-
- The Bubble sort uses the exchange method of sorting. The
- general concept behind the Bubble sort is the repeated
- comparisons and, if necessary, exchanges of adjacent elements.
- Its name comes from the method's similarity to bubbles in a tank
- of water, where each bubble seeks its own level. In this simplest
- form of the Bubble sort:
-
- /* bubble sort */
- void bubble(char item[], int count)
- {
- int i, k;
- char t;
-
- for (i = 1; i < count; ++i)
- for (k = count-1; k >= i; --k) {
- if (item[k-1] > item[k]) {
- /* exchange elements */
- t = item[k-1];
- item[k-1] = item[k];
- item[k] = t;
- }
- }
- }
-
- 'item' is a pointer to the character array to be sorted and
- 'count' is the number of elements in the array.
-
- The Bubble sort is driven by two loops. Given that there are
- 'count' elements in the array, the outer loop causes the array to
- be scanned 'count-1' times. This ensures that, in the worst case,
- every element is in its proper position when the function
- terminates. The inner loop performs the actual comparisons and
- exchanges. (A slightly optimized version of the Bubble sort will
- terminate if no exchanges occur, but this also adds another
- comparison to each pass through the inner loop.)
-
- This version of the Bubble sort can be used to sort a
- character array into ascending order. For example, this short
- program sorts a string typed in from the keyboard:
-
- #include <stdio.h>
- #include <string.h>
-
- /* function prototypes */
- void bubble(char [], int);
-
- void main()
- {
- char str[256];
- int len;
-
- do {
- printf("\nEnter a string:\n");
- printf("(Exit program with an empty string.)\n");
- gets(str);
- len = strlen(str);
- if (len > 0) {
- bubble(str, len);
- printf("Sorted list:\n%s\n", str);
- }
- } while (len > 0);
- }
-
- To illustrate how the Bubble sort works, here are the passes
- used to sort 'dcab'.
-
- initial: d c a b
- pass l: a d c b ('a' moved to the leftmost)
- pass 2: a b d c ('b' moved to the left)
- pass 3: a b c d ('c' exchanged with 'd')
-
- When analyzing any sort, you must determine how many
- comparisons and exchanges will be performed for the best,
- average, and worst case. With the Bubble sort, the number of
- comparisons is always the same because the two for loops will
- still repeat the specified number of times, whether or not the
- list is initially ordered. This means that the Bubble sort will
- always perform (n^2-n)/2 comparisons, where n is the number of
- elements to be sorted. (There are 6 comparisons in this example.)
- The number of exchanges is 0 for the best case - an already
- sorted list. The numbers are 3*(n^2-n)/4 for the average case and
- 3*(n^2-n)/2 for the worst case. It is beyond the scope of this
- course to explain the derivation of these cases, but you can see
- that as the list becomes less ordered, the number of elements
- that are out of order approaches the number of comparisons.
- (Remember, there are three exchanges in a Bubble sort for every
- element out of order.) The Bubble sort is said to be an n-squared
- algorithm because its execution time is a multiple of the square
- of the number of elements. A Bubble sort is bad for a large
- number of elements because execution time is directly related to
- the number of comparisons and exchanges.
-
- For example, if you ignore the time it takes to exchange any
- out-of-position element, and if each comparison takes 0.001
- seconds, then sorting 10 elements will take about 0.05 seconds,
- sorting 100 elements will take about 5 seconds, and sorting 1000
- elements will take about 500 seconds. A 100,000 element sort (the
- size of a small telephone book) would take about 5,000,000
- seconds, or about 1400 hours - about two months of continuous
- sorting!
-
- Sorting by Selection
- ~~~~~~~~~~~~~~~~~~~~
- A Selectwn sort selects the element with the lowest value
- and exchanges that with the first element. Then from the remain-
- ing (n-1) elements, the element with the least key is found and
- exchanged with the second element, and so forth, up to the last
- two elements. For example, if the selection method were to be
- used on the array 'bdac', each pass would look like this:
-
- initial: b d a c
- pass 1: a d b c ('a' was selected)
- pass 2: a b d c ('b' was selected)
- pass 3: a b c d ('c' was selected)
-
- The basic Selection sort is shown here:
-
- /* selection sort */
- void select(char item[], int count)
- {
- int i, j, k;
- char t;
-
- for (i = 0; i < count-1; ++i) {
- k = i;
- t = item[i];
- for (j = i+1; j < count; ++j) {
- if(item[j] < t) {
- k = j;
- t = item[j];
- }
- }
- item[k] = item[i];
- item[i] = t;
- }
- }
-
- Unfortunately, the number of comparisons in the Select sort
- is the same as the Bubble sort. This means that the Selection
- sort requires (n^2-n)/2 comparisons, which makes it too slow for
- a large number of items. The number of exchanges for the best
- case is 3*(n-1) and for the worst case is n^2/4 + 3*(n-1).
-
- For the best case, if the list is ordered, then only n-1
- elements need to be moved, and each move requires three
- exchanges. The worst case approximates the number of comparisons.
- Although the average case is beyond the scope of this course to
- develop. It is n*(ln(n)+e) where e is Euler's constant (about
- 0.577216). This means that although the number of comparisons for
- both the Bubble sort and the Selection sort is the same, the
- number of exchanges in the average case is far less for the
- Selection sort.
-
- Sorting by Insertion
- ~~~~~~~~~~~~~~~~~~~~
- The Insertion sort is the last of the simple sorting
- algorithms. The Insertion sort initially sorts the first two
- members of the array. Next, the algorithm inserts the third
- member into its sorted position in relation to the first two
- members. Then, the fourth element is inserted into the list of
- three elements. The process continues until all elements have
- been sorted. For example, given the array 'dcab', each pass of
- the Insertion sort would look like this:
-
- initial: d c a b
- pass 1: c d a b ('c' & 'd' were exchanged)
- pass 3: a c d b ('a' moved to the leftmost)
- pass 4: a b c d ('b' inserted)
-
- A version of the Insertion sort is shown here:
-
- /* sorting by straight insertion */
- void insert(char item[], int count)
- {
- int i, k;
- char t;
-
- for (i = 1; i < count; ++i) {
- t = item[i];
- k = i-1;
- while (k >= 0 && t < item[k]) {
- item[k+1] = item[k];
- k--;
- }
- item[k+1] = t;
- }
- }
-
- Unlike the Bubble sort and the Selection sort, the number of
- comparisons that occur while the Insertion sort is used will
- depend on how the list is initially ordered. If the list is in
- order, then the number of comparisons is (n-1). If the list is in
- inverse order, then the number of comparisons is (n^2+n)/2-1,
- while its average number of comparisons is (n^2+n-2)/4.
-
- The number of exchanges for each case is as follows:
-
- best 2*(n-1)
- average (n^2+9*n-10)/4
- worst (n^2+3*n-4)/2
-
- Therefore, the number for the worst case is as bad as those
- for the Bubble and Selection sorts, and for the average case it
- is only slightly better.
-
- The Insertion sort does have two advantages, however. First,
- it behaves naturally: it works the least when the array is
- already sorted and the hardest when the array is sorted in
- inverse order. This makes the Insertion sort useful for lists
- that are almost in order. Second, it leaves the order of equal
- keys unchanged: if a list is sorted using two keys, then it
- remains sorted for both keys after an Insertion sort.
-
- Even though the comparisons may be fairly good for certain
- sets of data, the fact that the array must always be shifted over
- each time an element is placed in its proper location means that
- the number of moves can be very significant. However, the
- Insertion sort still behaves naturally, with the least exchanges
- occurring for an almost sorted list and the most exchanges for an
- inversely ordered array.
-
- Improved Sorts
- ==============
-
- All of the algorithms thus far had the fatal flaw of
- executing in n^2 time. For large amounts of data, the sorts would
- be slow - in fact, at some point, too slow to use. Every computer
- programmer has heard, or told, the horror story of the "sort that
- took three days." Unfortunately, these stories are often true.
-
- When a sort takes too long, it may be the fault of the
- underlying algorithm. However, a sad commentary is that the first
- response is often "write it in assembly code." Although assembler
- code will sometimes speed up a routine by a constant factor, if
- the underlying algorithm is bad, the sort will be slow no matter
- how optimal the coding. Remember, when the run time of a routine
- is relative to n^2, increasing the speed of the coding or of the
- computer will only cause a slight improvement because the rate at
- which the run time increases changes exponentially. Keep in mind
- that if something is not fast enough in C code, then it will
- generally not be fast enough in assembler. The solution is to use
- a better sorting algorithm.
-
- In this section, two excellent sorts are developed. The first
- is the Shell sort, and the second is the Quicksort, which is
- generally considered the best sorting routine. These sorts run so
- fast that if you blink, you will miss them.
-
- The Shell Sort
- ~~~~~~~~~~~~~~
- The Shell sort is named after its inventor, D.L. Shell.
- However, the name seems to have stuck because its method of
- operation resembles sea shells piled upon one another.
-
- The general method, derived from the Insertion sort, is based
- on diminishing increments. Below gives a diagram of a Shell sort
- on the array 'fdacbe'. First, all elements that are three positions
- apart are sorted. Then all elements that are two positions apart
- are sorted. Finally, all those adjacent to each other are sorted.
-
- initial: f d a c b e
- ^--------^ exchange
- ^--------^ exchange
- ^--------^ no exchange
-
- pass1: c b a f d e
- ^-----^ exchange
- ^-----^ no exchange
- ^-----^ no exchange
- ^-----^ exchange
-
- pass2: a b c e d f
- ^--^ exchange
-
- pass3: a b c d e f
-
- It may not be obvious that this method yields good results,
- or even that it will sort the array, but it does both. This
- algorithm is efficient because each sorting pass involves
- relatively few elements, or elements that are already in
- reasonable order; therefore each pass increases the order of the
- data.
-
- The exact sequence for the increments can be changed. The
- only rule is that the last increment must be 1. For example, the
- sequence 9, 5, 3, 2, 1 works well and is used in the Shell sort
- shown here. Avoid sequences that are powers of 2 because, for
- mathematically complex reasons, they reduce the efficiency of the
- sorting algorithm. (However, even if you use them, the sort would
- still work.)
-
- /* a shell sort */
- void shell(char item[], int count)
- {
- int i, j, k, s, w;
- char x, a[5];
-
- a[0] = 9; a[1] = 5; a[2] = 3; a[3] = 2; a[4] = 1;
- for (w = 0; w < 5; w++) {
- k = a[w]; s = -k;
- for (i = k; i < count; ++i) {
- x = item[i];
- j = i - k;
- if (s == 0) {
- s = -k;
- s++;
- item[s] = x;
- }
- while (x < item[j] && j >= 0 && j <= count) {
- item[j+k] = item[j];
- j -= k;
- }
- item[j+k] = x;
- }
- }
- }
-
- You may have noticed that the inner while loop has three test
- conditions. The 'x < item[j]' is a comparison necessary for the
- sorting process. The tests 'j >= 0' and 'j <= count' are used to
- keep the sort from overrunning the boundary of the array 'item'.
- These extra checks degrade the performance of Shell sort to some
- extent. Slightly different versions of the Shell sort employ spe-
- cial array elements, called sentinels, which are not actually
- part of the array to be sorted. Sentinels hold special
- termination values that indicate the least and greatest possible
- elements. In this way the boundary checks are unnecessary.
- However, using sentinels requires a specific knowledge of the
- data, which limits the generality of the sort function.
-
- The analysis of the Shell sort presents some very difficult
- mathematical problems that are far beyond the scope of this text.
- However, execution time is proportional to n^l.2 for sorting n
- elements. This is a very significant improvement over the n^2
- sorts of the previous section. However, before you decide to use
- the Shell sort, you should know that the Quicksort is even
- hetter.
-
-
- The Quicksort
- ~~~~~~~~~~~~~
- The Quicksort, invented and named by C.A.R. Hoare, is
- generally considered the best sorting algorithm currently
- available. It is based on the exchange method of sorting. This is
- surprising if you consider the terrible performance of the Bubble
- sort, which is also based on the exchange method.
-
- The Quicksort is built on the idea of partitions. The general
- procedure is to select a value (called the comparand) and then to
- partition the array into two parts with all elements greater than
- or equal to the partition value on one side and those less than
- the partition value on the other. This process is then repeated
- for each remaining part until the array is sorted. For example,
- given the array 'fedacb' and using the value 'd', the first pass
- of the Quicksort would rearrange the array like this:
-
- initial: f e d | a c b
- ^------------ the comparand
- pass 1: b c a | d e f
-
- This process is then repeated for each half ('bca' and
- 'def'). The process is essentially recursive; indeed, the
- cleanest implementations of Quicksort are recursive algorithms.
-
- The selection of the middle comparand value can be
- accomplished two ways. The value can be chosen either at random
- or by averaging a small set of values taken from the array. For
- the optimal sorting it is desirable to select a value that is
- precisely in the middle of the range of values. However, this is
- not easy to do for most sets of data. Even in the worst case - the
- value chosen is at one extremity - Quicksort still performs well.
-
- The following version of Quicksort selects the middle element
- of the array. Although this may not always result in a good
- choice, the sort still performs correctly.
-
- /* quicksort set up */
- void quick(char item[], int count)
- {
- qs(item, 0, count-1);
- }
-
- /* quicksort */
- void qs(char item[], int left, int right)
- {
- int i, j;
- char x, y;
-
- i = left; j = right;
- x = item[(left+right)/2];
-
- do {
- while (item[i] < x && i < right) i++;
- while (x < item[j] && j > left) j--;
- if (i <= j) {
- y = item[i];
- item[i] = item[j];
- item[j] = y;
- i++; j--;
- }
- } while (i <= j);
- if (left < j) qs(item, left, j);
- if (i < right) qs(item, i, right);
- }
-
-
- In this version, the function quick() sets up a call to the
- main sorting function, called qs(). While this maintains the same
- common interface of 'item' and 'count', it is not essential
- because qs() could have been called directly using three
- arguments.
-
- The derivation of the number of comparisons and number of
- exchanges that Quicksort performs requires mathematics beyond the
- scope of this text. However, you can assume that the number of
- comparisons is n*log(n) and that the number of exchanges is
- approximately n*log(n)/6. These are significantly better than any
- of the previous sorts discussed so far.
-
- The equation
- N = 10^x
-
- can be rewritten as
- x = log(N)
-
- This means, for example, that if 100 elements were to be
- sorted, Quicksort would require 100 * 2, or 200, comparisons
- because log 100 is 2. Compared with the Bubble sort's 4950
- comparisons, this number is quite good.
-
- You should be aware of one particularly nasty aspect to
- Quicksort. If the comparand value for each partition happens to
- be the largest value, then Quicksort degenerates into "slowsort"
- with an n^2 run time. Generally, however, this does not happen.
-
- You must carefully choose a method of determining the value
- of the comparand. Often the value is determined by the actual
- data you are sorting. In large mailing lists where the sorting is
- often by ZIP code, the selection is simple, because the ZIP codes
- are fairly evenly distributed and a simple algebraic function can
- determine a suitable comparand. However, in certain databases,
- the sort keys may be so close in value (with many being the same
- value) that a random selection is often the best method
- available. A common and fairly effective method is to sample
- three elements from a partition and take the middle value.
-
- Generally, the Quicksort is the sort of choice because it is
- so fast. However, when only very small lists of data are to be
- sorted (less than 100), the overhead created by Quicksort's
- recursive calls may offset the benefits of a superior algorithm.
- In rare cases like this, one of the simpler sorts (perhaps even
- the Bubble sort) will be quicker.
-
-
- Sorting Other Data Structures
- =============================
-
- Until now, you have only been sorting arrays of characters.
- This has made it easy to present each of the sorting routines.
- Obviously, arrays of any of the built-in data types can be sorted
- simply by changing the data types of the parameters and variables
- to the sort function. However, generally complex data types like
- strings, or groupings of information like structures, need to be
- sorted. Most sorting involves a key and information linked to
- that key. To adapt the algorithms to sort other structures, you
- need to alter either the comparison section or the exchange
- section, or both. The algorithm itself will remain unchanged.
-
- Because Quicksort is one of the best general-purpose routines
- available at this time, it will be used in the following
- examples. The same techniques will apply to any of the sorts
- described earlier.
-
- Sorting strings
- ~~~~~~~~~~~~~~~
- The easiest way to sort strings is to create an array of
- character pointers to those strings. This allows you to maintain
- easy indexing and keeps the basic Quicksort algorithm unchanged.
- The string version of Quicksort shown here will accept an array
- of char pointers that point to the strings to be sorted. The sort
- rearranges the pointers to the strings, not the actual strings in
- memory. This version sorts the strings in alphabetical order.
-
- /* quick sort for strings setup */
- void quick_string(char *item[], int count)
- {
- qs_string(item, 0, count-1);
- }
-
- /* quick sort for strings */
- void qs_string(char *item[], int left, int right)
- {
- int i, j;
- char *x, *y;
-
- i = left; j = right;
- x = item[(left+right)/2];
- do {
- while (strcmp(item[i], x) < 0 && i < right) i++;
- while (strcmp(item[j], x) > 0 && j > left) j--;
- if (i <= j) {
- y = item[i];
- item[i] = item[j];
- item[j] = y;
- }
- } while (i <= j);
- if (left < j) qs_tring(item, left, j);
- if (i < right) qs_string(item, i, right);
- }
-
- The comparison step has been changed to use the function
- strcmp(), which returns a negative number if the first string is
- lexicographically less than the second, 0 if the strings are
- equal, and a positive number if the first string is
- lexicographically greater than the second. The exchange part of
- the routine has been left unchanged because only the pointers are
- being exchanged - not the actual strings. To exchange the actual
- strings, you would have to use the function strcpy().
-
- The use of strcmp() will slow down the sort for two reasons.
- First, it involves a function call, which always takes time;
- second, the strcmp() function itself performs several comparisons
- to determine the relationship of the two strings. If speed is
- absolutely critical, the code for strcmp() can be duplicated
- in-line inside the routine. However, there is no way to avoid
- comparing the strings, since this is by definition what the task
- involves.
-
- Sorting Structures
- ~~~~~~~~~~~~~~~~~~
- Most application programs that require a sort will need to
- have a grouping of data sorted. A mailing list is an excellent
- example because a name, street, city, state, and ZIP code are all
- linked together. When this conglomerate unit of data is sorted, a
- sort key is used, but the entire structure is exchanged. To see
- how this is done, you first need to create a structure. For the
- mailing-list example, a convenient structure is
-
- struct address {
- char name[40];
- char street[40];
- char city[20];
- char state[3];
- char zip[10];
- };
-
- The 'state' is three characters long and 'zip' is ten
- characters long because a string array always needs to be one
- character longer than the maximum length of any string in order
- to store the null terminator.
-
- Since it is reasonable to arrange a mailing list as an array
- of structures, assume for this example that the sort routine
- sorts an array of structures of type 'address', as shown here.
-
- void quick_struct(struct address [], int);
- void qs_struct(struct address [], int, int);
-
- /* quick sort for structures setup */
- void quick_struct(struct address item[], int count)
- {
- qs_struct(item, 0, count-1);
- }
-
- /* quick sort for structures */
- void qs_struct(struct address item[], int left, int right)
- {
- int i, j;
- char *x, *y;
-
- i = left; j = right;
- x = item[(left+right)/2].zip;
- do {
- while (strcmp(item[i].zip, x) < 0 && i < right) i++;
- while (strcmp(item[j].zip, x) > 0 && j > left) j--;
- if (i <= j) {
- swap_all_fields(item, i, j);
- i++; j--;
- }
- } while (i <= j);
- if (left < j) qs_struct(item, left, j);
- if (i < right) qs_struct(item, i, right);
- }
-
- Notice that both the comparison code and the exchange code
- needed to be altered. Because so many fields needed to be
- exchanged, a separate function, swap_all_fields(), was created to
- do this. You will need to create swap_ all_fields() in accordance
- with the nature of the structure being sorted.
-
-
-
- Searching Methods
- =================
-
- Databases of information exist so that, from time to time, a
- user can locate a record by knowing its key. There is only one
- method of finding information in an unsorted file or array, and
- another for a sorted file or array. Many compilers supply
- search functions as part of the standard library. However, as
- with sorting, general-purpose routines sometimes are simply too
- inefficient for use in demanding situations because of the extra
- overhead created by the generalization.
-
- Finding information in an unsorted array requires a
- sequential search starting at the first element and stopping
- either when a match is found or when the end of the array is
- reached. This method must be used on unsorted data, but can also
- be applied to sorted data. If the data has been sorted, then a
- binary search can be used, which greatly speeds up any search.
-
- The Sequential Search
- ~~~~~~~~~~~~~~~~~~~~~
- The sequential search is easy to code. The following function
- searches a character array of known length until a match is found
- with the specified key.
-
-
- int sequential_search(char item[], int count, char key)
- {
- int i;
- for (i = 0; i < count; ++i)
- if (key == item[i]) return (i);
- return (-1); /* no match */
- }
-
- This function will return the index number of the matching
- entry if there is one, or -1 if there is not.
-
- A straight sequential search will, on the average, test n/2
- elements. In the best case, it will test only one element and in
- the worst case n elements. If the information is stored on disk,
- the search time can be very long. But if the data is unsorted,
- this is the only method available.
-
- The Binary Search
- ~~~~~~~~~~~~~~~~~
- If the data to be searched is in sorted order, then a
- superior method, called the binary search, can be used to find a
- match. The method uses the "divide-and-conquer" approach. It
- first tests the middle element; if the element is larger than the
- key, it then tests the middle element of the first half;
- otherwise, it tests the middle element of the second half This
- process is repeated until either a match is found, or there are
- no more elements to test.
-
- For example, to find the number 4 in the array
-
- 1 2 3 4 5 6 7 8 9
-
- the binary search would first test the middle, which is 5.
- Since this is greater than 4, the search would continue with the
- first half, or
-
- 1 2 3 4 5
-
- In this example, the middle element is 3. This is less than
- 4, so the first half is discarded and the search continues with
-
- 4 5
-
- This time the match is found.
-
- In the binary search, the number of comparisons given the
- worst case is log2(n). With average cases, the number is somewhat
- better; in the best case, the number is 1.
-
- A binary search function for character arrays is shown here.
- You can make this search any arbitrary data structure by changing
- the comparison portion of the routine.
-
- /* Binary search */
- int binary(char item[], int count, char key)
- {
- int low, high, mid;
-
- low = 0; high = count-1;
- while(low <= high) {
- mid = (low+high) / 2;
- if (key < item[mid]) high = mid - 1;
- else if( key > item[mid]) low = mid + 1;
- else return (mid); /* found */
- }
- return (-1);